street sign
VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments
Yang, Bufang, He, Lixing, Liu, Kaiwei, Yan, Zhenyu
Individuals with visual impairments, encompassing both partial and total difficulties in visual perception, are referred to as visually impaired (VI) people. An estimated 2.2 billion individuals worldwide are affected by visual impairments. Recent advancements in multi-modal large language models (MLLMs) have showcased their extraordinary capabilities across various domains, making their strong visual understanding and reasoning a promising aid for VI individuals. However, it is challenging for VI people to use MLLMs because they often struggle to capture images that adequately convey their daily requests; for example, the target object may be only partially captured, or missing from the frame entirely. This paper explores how to leverage MLLMs to provide visual question answering for VI individuals and proposes VIAssist, an MLLM-based assistant tailored to them. VIAssist can identify undesired images and provide detailed, actionable suggestions for retaking them. Finally, VIAssist can provide reliable answers to users' queries based on the resulting images. Our results show that VIAssist achieves +0.21 and +0.31 higher BERTScore and ROUGE scores than the baseline, respectively.
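To make the two-stage flow described above concrete, here is a minimal sketch; it is not the authors' implementation, and query_mllm(image, prompt) is a hypothetical wrapper around any multi-modal LLM API that returns a text reply.

def query_mllm(image, prompt: str) -> str:
    # Placeholder: plug in your MLLM API client here.
    raise NotImplementedError

def assist(image, question: str) -> str:
    # Stage 1: check whether the photo actually captures the target object.
    verdict = query_mllm(
        image,
        "Does this photo fully show the object the user asks about? "
        f"Question: {question}. Reply VISIBLE, or give one short retake "
        "instruction such as 'move the camera left' or 'step back'.",
    )
    if verdict.strip().upper() != "VISIBLE":
        return f"Please retake the photo: {verdict}"  # actionable guidance
    # Stage 2: the image is usable, so answer the user's actual question.
    return query_mllm(image, question)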
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
Bianco, Simone, Celona, Luigi, Donzella, Marco, Napoletano, Paolo
State-of-the-Art (SoTA) image captioning models often rely on the Microsoft COCO (MS-COCO) dataset for training. This dataset contains annotations provided by human annotators, who typically produce captions averaging around ten tokens. However, this constraint presents a challenge in effectively capturing complex scenes and conveying detailed information. Furthermore, captioning models tend to exhibit bias towards the "average" caption, which captures only the more general aspects. What would happen if we were able to automatically generate longer captions, thereby making them more detailed? Would these captions, evaluated by humans, be more or less representative of the image content compared to the original MS-COCO captions? In this paper, we present a novel approach to address these challenges by showcasing how captions generated from different SoTA models can be effectively fused, resulting in richer captions. Our proposed method leverages existing models from the literature, eliminating the need for additional training. Instead, it utilizes an image-text-based metric to rank the captions generated by SoTA models for a given image. Subsequently, the top two captions are fused using a Large Language Model (LLM). Experimental results demonstrate the effectiveness of our approach, as the captions generated by our model exhibit higher consistency with human judgment when evaluated on the MS-COCO test set. By combining the strengths of various SoTA models, our method enhances the quality and appeal of image captions, bridging the gap between automated systems and the rich, informative nature of human-generated descriptions. This advance opens up new possibilities for generating captions that are more suitable for the training of both vision-language and captioning models.
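As one way to make the rank-then-fuse pipeline concrete, the sketch below uses CLIP similarity (via Hugging Face transformers) as the image-text ranking metric and leaves the LLM fusion call abstract; the paper's exact metric, models, and fusion prompt are assumptions here, not its published details.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def top_two_captions(image: Image.Image, captions: list[str]) -> list[str]:
    # Rank candidate captions by CLIP image-text similarity; keep the best two.
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image[0]  # one score per caption
    best = sims.topk(2).indices.tolist()
    return [captions[i] for i in best]

def fusion_prompt(first: str, second: str) -> str:
    # Prompt for any LLM (call not shown) to merge the two top captions.
    return ("Combine the two image captions below into one longer, more "
            f"detailed caption without inventing content.\n1. {first}\n2. {second}")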
Cybersecurity Watch (La veille de la cybersécurité)
The first job for many artificial intelligence (AI) algorithms is to examine the data and find the best classification. An autonomous car, for example, may take an image of a street sign; the classification algorithm must interpret the sign by reading any words on it and comparing it to a list of known shapes and sizes. A phone must listen to a sound and determine whether it is one of its wake-up commands ("Alexa," "Siri," "Hey Google"). The job of classification is sometimes the ultimate goal of an algorithm; often, though, it is a preliminary step, and many data scientists use AI algorithms simply to preprocess their data and assign categories.
What is artificial intelligence classification?
The first job for many artificial intelligence (AI) algorithms is to examine the data and find the best classification. An autonomous car, for example, may take an image of a street sign; the classification algorithm must interpret the street sign by reading any words and comparing it to a list of known shapes and sizes. A phone must listen to a sound and determine whether it is one of its wake-up commands ("Alexa," "Siri," "Hey Google"). The job of classification is sometimes the ultimate goal of an algorithm.
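As a toy illustration of "examine the data and find the best classification," here is a minimal scikit-learn example on its bundled digits dataset; production street-sign or wake-word classifiers are far larger, but they follow the same fit-then-score shape.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)        # 8x8 images flattened to 64 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # learn class boundaries
print("held-out accuracy:", clf.score(X_te, y_te))       # evaluate the classifier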
Artificial intelligence drives next-generation street sign
Smartphones and GPS have made paper maps virtually obsolete and put the power of navigation in our pockets. But now, engineers are working on a high-tech update for another directional tool that could revolutionize how we find our way around. The first street signs date back hundreds of years. They help you figure out where you are and where you're going. But what if they could be updated throughout the day, hour by hour, to keep you informed about what's happening around you? "This is a fully functioning street sign that allows you to essentially market, advertise and communicate out to the public," Michael Ottoman said, showing off a high-tech version of the old street sign.
The cybersecurity battle of the future – AI vs. AI
Artificial intelligence and machine learning continue to gain a foothold in our everyday lives. Whether for complex tasks like computer vision and natural language processing, or something as basic as an online chatbot, their popularity shows no signs of slowing. Companies have also started to explore deep learning, an advanced subset of machine learning. By applying "deep neural networks," deep learning takes inspiration from how the human brain works. Unlike traditional machine learning, deep learning can train directly on raw data, requiring little to no human intervention (see the short sketch after this entry's topic tags).
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.51)
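To illustrate the raw-data point above, here is a minimal PyTorch sketch: a small convolutional network that consumes raw pixels directly, with no hand-engineered features. This is purely illustrative and not tied to any system mentioned in the article.

import torch
import torch.nn as nn

model = nn.Sequential(                      # raw image in, class scores out
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                      # e.g., 10 object classes
)

x = torch.randn(1, 3, 64, 64)               # a raw 64x64 RGB image (random here)
print(model(x).shape)                       # torch.Size([1, 10])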
Three People-Centered Design Principles for Deep Learning
Bad data and poorly designed AI systems can lead you to spurious conclusions and hurt customers, your products, and your brand. This article is part of an MIT SMR initiative exploring how technology is reshaping the practice of management. Over the past decade, organizations have begun to rely on an ever-growing number of algorithms to assist in making a wide range of business decisions, from delivery logistics, airline route planning, and risk detection to financial fraud detection and image recognition. We're seeing the end of the second wave of AI, which began several decades ago with the introduction of rule-based expert systems, and are moving into a new, third wave, termed perception AI. It's in this next wave where a specific subset of AI, called deep learning, will play an even more critical role.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
No Time Like Now to Leverage AI - TEK2day
In deploying artificial intelligence ("AI") or one of its sibling technologies – machine learning and deep learning – the first order of business is defining the business problem. Next, understand your enterprise data and third-party data in terms of scope and quality. Once those elements are in place, you are ready to embark on your AI journey, on which your imagination will be the primary limiting factor. These are problems that can be answered by deploying some combination of AI, machine learning, deep learning/neural networks, and/or natural language processing ("NLP"). Data quality is important: "garbage in, garbage out."
- Leisure & Entertainment (0.54)
- Transportation (0.36)
- Media > Film (0.34)
Adversarial camera stickers: A physical camera-based attack on deep learning systems
Li, Juncheng, Schmidt, Frank R., Kolter, J. Zico
Recent work has thoroughly documented the susceptibility of deep learning systems to adversarial examples, but most such instances directly manipulate the digital input to a classifier. Although a smaller line of work considers physical adversarial attacks, in all cases these involve manipulating the object of interest, e.g., putting a physical sticker on an object to misclassify it, or manufacturing an object specifically intended to be misclassified. In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself? We show that this is indeed possible, that by placing a carefully crafted and mainly translucent sticker over the lens of a camera, one can create universal perturbations of the observed images that are inconspicuous, yet reliably misclassify target objects as a different (targeted) class. To accomplish this, we propose an iterative procedure for both updating the attack perturbation (to make it adversarial for a given classifier), and the threat model itself (to ensure it is physically realizable). For example, we show that we can achieve physically-realizable attacks that fool ImageNet classifiers in a targeted fashion 49.6% of the time. This presents a new class of physically-realizable threat models to consider in the context of adversarially robust machine learning. Our demo video can be viewed at: https://youtu.be/wUVmL33Fx54
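Below is a heavily simplified sketch of the perturbation-update half of the alternating procedure, assuming an alpha-blended "lens film" model and any pretrained classifier; the threat-model fitting step that enforces physical realizability is elided, and none of this is the authors' code.

import torch
import torch.nn.functional as F

def composite(images, sticker, alpha=0.3):
    # Alpha-blend a sticker pattern over every image, mimicking a translucent
    # film on the camera lens (a stand-in for the paper's physical model).
    return (1 - alpha) * images + alpha * sticker.clamp(0, 1)

def sticker_step(classifier, images, sticker, target_class, lr=0.01):
    # One gradient step pushing all composited images toward the target class.
    sticker = sticker.detach().requires_grad_(True)
    logits = classifier(composite(images, sticker))
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    loss = F.cross_entropy(logits, target)
    loss.backward()
    return (sticker - lr * sticker.grad).detach()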
Google's reCAPTCHA test has been tricked by artificial intelligence
Computer scientists have found a way around Google's reCAPTCHA tests, tricking the system into thinking an artificial intelligence program is human. But there's a catch: although the AI system can fool the bot test, it doesn't live up to the promise its creators give it. CAPTCHAs are the tests used by websites to battle back against bots, asking website visitors to prove they're human before proceeding. The leading system is Google's reCAPTCHA, which has previously asked website visitors to prove their humanity by transcribing words scanned from books or photographs of street signs. That was replaced with behavioural analysis, requiring humans to simply tick a box proclaiming "I'm not a robot".
- North America > Canada > Ontario > Toronto (0.15)
- North America > United States > Illinois (0.05)
- Europe > United Kingdom > England > Dorset > Bournemouth (0.05)
- Asia > China (0.05)